30 research outputs found

    Design and Evaluation of a Collective IO Model for Loosely Coupled Petascale Programming

    Loosely coupled programming is a powerful paradigm for rapidly creating higher-level applications from scientific programs on petascale systems, typically using scripting languages. This paradigm is a form of many-task computing (MTC) which focuses on the passing of data between programs as ordinary files rather than messages. While it has the significant benefits of decoupling producer and consumer and allowing existing application programs to be executed in parallel with no recoding, its typical implementation using shared file systems places a high performance burden on the overall system and on the user who will analyze and consume the downstream data. Previous efforts have achieved great speedups with loosely coupled programs, but have done so with careful manual tuning of all shared file system access. In this work, we evaluate a prototype collective IO model for file-based MTC. The model enables efficient and easy distribution of input data files to computing nodes and gathering of output results from them. It eliminates the need for such manual tuning and makes the programming of large-scale clusters using a loosely coupled model easier. Our approach, inspired by in-memory approaches to collective operations for parallel programming, builds on fast local file systems to provide high-speed local file caches for parallel scripts, uses a broadcast approach to handle distribution of common input data, and uses efficient scatter/gather and caching techniques for input and output. We describe the design of the prototype model and its implementation on the Blue Gene/P supercomputer, and present preliminary measurements of its performance on synthetic benchmarks and on a large-scale molecular dynamics application.
    Comment: IEEE Many-Task Computing on Grids and Supercomputers (MTAGS08) 200

    Towards Loosely-Coupled Programming on Petascale Systems

    We have extended the Falkon lightweight task execution framework to make loosely coupled programming on petascale systems a practical and useful programming model. This work studies and measures the performance factors involved in applying this approach to enable the use of petascale systems by a broader user community, and with greater ease. Our work enables the execution of highly parallel computations composed of loosely coupled serial jobs with no modifications to the respective applications. This approach allows a new, and potentially far larger, class of applications to leverage petascale systems, such as the IBM Blue Gene/P supercomputer. We present the challenges of I/O performance encountered in making this model practical, and show results using both microbenchmarks and real applications from two domains: economic energy modeling and molecular dynamics. Our benchmarks show that we can scale up to 160K processor-cores with high efficiency, and can achieve sustained execution rates of thousands of tasks per second.
    Comment: IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SuperComputing/SC) 200

    Performance Analysis of a Parallel Discrete Model for the Simulation of Laser Dynamics

    This paper presents an analysis of the performance of a parallel implementation of a discrete model of laser dynamics, which is based on cellular automata. The performance of a 2D parallel version of the model is studied as a first step to test the feasibility of a parallel 3D version, which is needed to simulate specific laser systems. The 3D version will have to run on a parallel computer due to its runtime and memory requirements. The model has been implemented on a Beowulf cluster using the message passing paradigm. The parallel implementation is found to exhibit a good speedup, allowing us to run realistic simulations of laser systems on clusters of workstations, which could not be afforded on an individual machine due to the extensive runtime and memory size needed.
    Funding: Ministerio de Educación y Ciencia TIC2002-04498-C05-0

    Parallel implementation of a cellular automaton model for the simulation of laser dynamics

    A parallel implementation for distributed-memory MIMD systems of a 2D discrete model of laser dynamics based on cellular automata is presented. The model has been implemented on a PC cluster using a message passing library. Good performance has been obtained, allowing us to run realistic simulations of laser systems on clusters of workstations, which could not be afforded on an individual machine due to the extensive runtime and memory size needed.
    Funding: Ministerio de Educación y Ciencia TIN2005-08818-C04-0

    Parallel Cellular Automata-based Simulation of Laser Dynamics using Dynamic Load Balancing

    We present an analysis of the feasibility of executing a parallel bioinspired model of laser dynamics, based on cellular automata (CA), on the usual target platform for this kind of application: a heterogeneous non-dedicated cluster. As this model employs a synchronous CA, using the single program, multiple data (SPMD) paradigm, it is not clear in advance whether adequate efficiency can be obtained on this kind of platform. We have evaluated its performance under artificial load that simulates other tasks or jobs submitted by other users. A dynamic load balancing strategy with two main differences from most previous implementations of CA-based models has been used. First, it is possible to migrate load to cluster nodes initially not belonging to the pool. Second, a modular approach is taken in which the model is executed on top of a dynamic load balancing tool, the Dynamite system, gaining flexibility. Very satisfactory results have been obtained, with performance increases from 60% to 80%.
    Funding: Ministerio de Ciencia e Innovación TIN2007-68083-C02; Junta de Extremadura PRI06A22

    SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates

    CNN-based surrogates have become prevalent in scientific applications to replace conventional time-consuming physical approaches. Although these surrogates can yield satisfactory results with significantly lower computation costs over small training datasets, our benchmarking results show that data-loading overhead becomes the major performance bottleneck when training surrogates with large datasets. In practice, surrogates are usually trained with high-resolution scientific data, which can easily reach the terabyte scale. Several state-of-the-art data loaders have been proposed to improve the loading throughput in general CNN training; however, they are sub-optimal when applied to surrogate training. In this work, we propose SOLAR, a surrogate data loader that can ultimately increase loading throughput during training. It leverages three key observations from our benchmarking and contains three novel designs. Specifically, SOLAR first generates a pre-determined shuffled index list and accordingly optimizes the global access order and the buffer eviction scheme to maximize data reuse and the buffer hit rate. It then proposes a tradeoff between lightweight computational imbalance and heavyweight loading workload imbalance to speed up the overall training. It finally optimizes its data access pattern with HDF5 to achieve better parallel I/O throughput. Our evaluation with three scientific surrogates and 32 GPUs illustrates that SOLAR can achieve up to 24.4X speedup over PyTorch Data Loader and 3.52X speedup over state-of-the-art data loaders.
    Comment: 14 pages, 15 figures, 5 tables, submitted to VLDB '2

    The inferior gluteal artery anatomy: a detailed analysis with implications for plastic and reconstructive surgery

    Background: The inferior gluteal artery (IGA) is a large terminal branch of the anterior division of the internal iliac artery (ADIIA). There is a significant lack of data regarding the variable anatomy of the IGA. Materials and methods: A retrospective study was conducted to establish anatomical variations, their prevalence and morphometrical data on the IGA and its branches. The results of 75 consecutive patients who underwent pelvic computed tomography angiography (CTA) were analyzed. Results: The origin variation of each IGA was analyzed in detail. Four origin variations were observed. The most common, Type O1, occurred in 86 of the studied cases (62.3%). The median IGA length was 68.50 mm (LQ = 54.29; HQ = 86.06). The median distance from the origin of the ADIIA to the origin of the IGA was 38.22 mm (LQ = 20.22; HQ = 55.97). The median origin diameter of the IGA was 4.69 mm (LQ = 4.13; HQ = 5.45). Conclusions: The present study thoroughly analyzed the complete anatomy of the IGA and the branches of the ADIIA. A novel classification system for the origin of the IGA was created, where the most prevalent origin was from the ADIIA (Type 1; 62.3%). Furthermore, the morphometric properties (such as the diameter and length) of the branches of the ADIIA were analyzed. These data may be useful for physicians performing operations in the pelvis, such as interventional intraarterial procedures or various gynecological surgeries.